
Conversation

@jan-janssen (Member) commented Jul 27, 2025

Summary by CodeRabbit

  • New Features
    • Added support for selecting different backends ("slurm" and "flux") and specifying PMI modes when running parallel tasks.
    • Introduced PMI mode configuration across executors and task schedulers for improved MPI compatibility.
  • Bug Fixes
    • Improved error handling for invalid backend selections during parallel task execution.
  • Tests
    • Introduced new tests to verify command generation for different backends, PMI modes, and core counts.
  • Documentation
    • Updated installation instructions to correct PMI mode parameter naming.

@coderabbitai coderabbitai bot commented Jul 27, 2025

Walkthrough

The changes extend the get_cache_execute_command function to support multiple MPI backends (mpiexec, srun, flux run) and an optional PMI mode parameter for specifying MPI interface options. This enhancement is propagated through related executors, schedulers, and spawners, with corresponding test updates and documentation corrections. The workflow is also updated to install MPI packages and run MPI environment checks.
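To make the backend dispatch concrete, here is a hedged sketch; the function and script names (`build_cache_command`, `cache_serial.py`, `cache_parallel.py`) are simplified stand-ins, and the real `get_cache_execute_command` in `executorlib/standalone/command.py` differs in details such as interpreter and script-path resolution:

```python
from typing import Optional


def build_cache_command(
    file_name: str,
    cores: int = 1,
    backend: Optional[str] = None,
    pmi_mode: Optional[str] = None,
) -> list:
    """Assemble the command list used to run a cached task file (sketch)."""
    if cores == 1:
        # Serial execution ignores the backend and PMI settings.
        return ["python", "cache_serial.py", file_name]
    if backend is None:
        launcher = ["mpiexec", "-n", str(cores)]
    elif backend == "slurm":
        launcher = ["srun", "-n", str(cores)]
        if pmi_mode is not None:
            launcher.append("--mpi=" + pmi_mode)
    elif backend == "flux":
        launcher = ["flux", "run", "-n", str(cores)]
        if pmi_mode is not None:
            launcher += ["-o", "pmi=" + pmi_mode]
    else:
        raise ValueError(f"backend should be None, slurm or flux, not {backend}")
    return launcher + ["python", "cache_parallel.py", file_name]
```

The PMI flag is only attached for launchers that understand it (`srun --mpi=...`, `flux run -o pmi=...`); the plain `mpiexec` path stays untouched for backward compatibility.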

Changes

File(s) Change Summary
executorlib/standalone/command.py Extended get_cache_execute_command with backend and pmi_mode parameters to support different MPI backends and PMI flags.
executorlib/task_scheduler/file/shared.py Updated execute_tasks_h5 to accept and forward pmi_mode to get_cache_execute_command.
executorlib/executor/flux.py Replaced deprecated flux_executor_pmi_mode parameter with unified pmi_mode in Flux executors and related functions.
executorlib/executor/slurm.py Added pmi_mode parameter to Slurm executors and factory function, propagating PMI mode settings.
executorlib/executor/single.py Renamed flux_executor_pmi_mode argument to pmi_mode in TestClusterExecutor initialization.
executorlib/standalone/inputcheck.py Renamed check_flux_executor_pmi_mode to check_pmi_mode with updated error message for PMI mode validation.
executorlib/task_scheduler/file/task_scheduler.py Added pmi_mode parameter to FileTaskScheduler and create_file_executor, replacing deprecated flux_executor_pmi_mode checks.
executorlib/task_scheduler/interactive/fluxspawner.py Renamed PMI parameter from flux_executor_pmi_mode to pmi_mode in FluxPythonSpawner.
executorlib/task_scheduler/interactive/slurmspawner.py Added pmi_mode parameter to SrunSpawner constructor and generate_slurm_command to append PMI flags to SLURM commands.
.github/workflows/pipeline.yml Added installation of mpich and MPI environment checks in Slurm test job; reordered unit test execution.
docs/installation.md Corrected parameter name from flux_executor_pmi_mode to pmi_mode in Flux executor installation instructions.
notebooks/3-hpc-job.ipynb Updated usage examples to rename flux_executor_pmi_mode parameter to pmi_mode in FluxJobExecutor instantiations.
tests/test_standalone_command.py Added comprehensive tests for get_cache_execute_command with various backends and PMI modes; tests for get_interactive_execute_command.
tests/test_fluxclusterexecutor.py Added pmi_mode parameter to FluxClusterExecutor instantiations in tests.
tests/test_fluxjobexecutor.py Renamed flux_executor_pmi_mode to pmi_mode in FluxJobExecutor test cases.
tests/test_fluxpythonspawner.py Renamed PMI parameter in Flux executor kwargs from flux_executor_pmi_mode to pmi_mode.
tests/test_slurmclusterexecutor.py Modified Slurm job template and added pmi_mode parameter to SlurmClusterExecutor tests.
tests/test_slurmjobexecutor.py Added MPI parallel test with SlurmJobExecutor using pmi_mode; updated serial test to include --mpi=pmi2.
tests/test_standalone_inputcheck.py Renamed test for PMI mode check from check_flux_executor_pmi_mode to check_pmi_mode.
tests/test_standalone_interactive_backend.py Updated test to include --mpi=pmi2 in Slurm command and pass pmi_mode to SrunSpawner.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant SharedPy
    participant CommandPy

    Caller->>SharedPy: execute_tasks_h5(file_name, cores, backend, pmi_mode)
    SharedPy->>CommandPy: get_cache_execute_command(file_name, cores, backend, pmi_mode)
    CommandPy-->>SharedPy: command_list
    SharedPy-->>Caller: command_list

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • liamhuber

Poem

🐇 Hopping through code with a joyful cheer,
Backends and PMI modes now appear!
From mpiexec to flux’s bright run,
Slurm joins in, the race is won.
Tests all green, commands aligned,
This bunny’s work is well-defined!
🚀✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
executorlib/standalone/command.py (1)

36-55: Consider refactoring for better maintainability.

The backend logic is functionally correct and handles all cases properly. However, the implementation is repetitive and could be improved for maintainability.

Consider refactoring to reduce code duplication:

-        if backend is None:
-            command_lst = (
-                ["mpiexec", "-n", str(cores)]
-                + command_lst
-                + [get_command_path(executable="cache_parallel.py"), file_name]
-            )
-        elif backend == "slurm":
-            command_lst = (
-                ["srun", "-n", str(cores)]
-                + command_lst
-                + [get_command_path(executable="cache_parallel.py"), file_name]
-            )
-        elif backend == "flux":
-            command_lst = (
-                ["flux", "run", "-n", str(cores)]
-                + command_lst
-                + [get_command_path(executable="cache_parallel.py"), file_name]
-            )
-        else:
-            raise ValueError(f"backend should be None, slurm or flux, not {backend}")
+        backend_commands = {
+            None: ["mpiexec", "-n", str(cores)],
+            "slurm": ["srun", "-n", str(cores)],
+            "flux": ["flux", "run", "-n", str(cores)],
+        }
+        
+        if backend not in backend_commands:
+            raise ValueError(f"backend should be None, slurm or flux, not {backend}")
+            
+        command_lst = (
+            backend_commands[backend]
+            + command_lst
+            + [get_command_path(executable="cache_parallel.py"), file_name]
+        )

This approach centralizes the backend configuration and makes it easier to add new backends in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88f44ef and 8b2d260.

📒 Files selected for processing (3)
  • executorlib/standalone/command.py (2 hunks)
  • executorlib/task_scheduler/file/shared.py (1 hunks)
  • tests/test_standalone_command.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: notebooks_integration
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_win
🔇 Additional comments (7)
executorlib/standalone/command.py (2)

4-4: LGTM: Proper type import for optional parameter.

The Optional import is correctly added to support the new optional backend parameter type hint.


20-22: LGTM: Well-designed function signature enhancement.

The function signature properly adds the optional backend parameter with appropriate type hints and maintains backward compatibility. The docstring clearly documents the new parameter and its accepted values.

Also applies to: 29-29

executorlib/task_scheduler/file/shared.py (1)

157-157: LGTM: Proper parameter propagation.

The backend parameter is correctly passed through to get_cache_execute_command, enabling the backend-specific command generation functionality introduced in the command module.

tests/test_standalone_command.py (4)

1-6: LGTM: Clean test structure and imports.

The test imports are appropriate and the test class follows standard unittest conventions.


7-15: LGTM: Good coverage of interactive command generation.

The tests properly verify that the correct scripts are selected for serial vs parallel execution. Using path splitting to check script names is a good approach for cross-platform compatibility.


17-30: LGTM: Comprehensive serial execution testing.

The tests correctly verify that backend parameters have no effect on serial execution (cores=1), testing multiple backend values to ensure consistent behavior. This validates that the backend logic only applies to parallel execution.


32-57: Excellent comprehensive testing of the new backend functionality.

The parallel execution tests thoroughly validate all supported backends with complete command structure verification. The tests properly check:

  • Correct backend commands (mpiexec, srun, flux run)
  • Proper argument ordering and values
  • Error handling for invalid backends

This provides excellent coverage of the new functionality and will catch potential regressions.

@jan-janssen jan-janssen marked this pull request as draft July 27, 2025 09:22
@jan-janssen (Member, Author) commented:

It is surprising that the tests pass with flux and mpich but fail for flux with openmpi and for slurm with mpich. The flux-with-openmpi failure could be related to the pmix configuration, but that alone would not explain why slurm with mpich fails as well.

@jan-janssen (Member, Author): Debug it on Binder

@codecov codecov bot commented Jul 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.47%. Comparing base (88f44ef) to head (3646610).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #760      +/-   ##
==========================================
+ Coverage   97.43%   97.47%   +0.03%     
==========================================
  Files          32       32              
  Lines        1403     1423      +20     
==========================================
+ Hits         1367     1387      +20     
  Misses         36       36              

☔ View full report in Codecov by Sentry.

@jan-janssen jan-janssen force-pushed the parallel_file_executor branch from fdd9c5d to 2545358 on July 27, 2025 11:36
@jan-janssen (Member, Author) commented:

There seems to be a general issue with the SLURM tests, so fixing those is the next step.

* Debug slurm pmi options

* Update pipeline.yml

* disable MPI parallel test

* sinfo

* update output

* add mpi parallel test

* core validation is only possible on compute node not on login node

* fix test

* enforce pmix for testing

* Update slurmspawner.py

* downgrade to openmpi

* remove pmi setting

* use slurm environment

* Try pmix again

* Update slurmspawner.py

* Update slurmspawner.py

* Update pipeline.yml

* Update pipeline.yml

* Update slurmspawner.py

* Update pipeline.yml

* use slrum args

* extend test

* fix tests

* Add executor_pmi_mode option

* check all files

* fixes

* extend tests

* rename to pmi_mode

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

@jan-janssen jan-janssen marked this pull request as ready for review July 27, 2025 18:49
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
.github/workflows/pipeline.yml (1)

283-284: LGTM! Enhanced MPI environment setup and debugging.

The explicit installation of mpich and the addition of debugging commands (sinfo and srun --mpi=list) will help diagnose MPI configuration issues. The test reordering also appears sensible.

Consider adding error handling for the debugging commands to prevent workflow failures if they encounter issues:

-        sinfo -o "%n %e %m %a %c %C"
-        srun --mpi=list
+        sinfo -o "%n %e %m %a %c %C" || echo "sinfo command failed"
+        srun --mpi=list || echo "srun --mpi=list command failed"

This would ensure the workflow continues even if these diagnostic commands fail, while still providing valuable debugging information when they succeed.

Also applies to: 300-303

notebooks/3-hpc-job.ipynb (1)

1-501: Address static analysis issues in the notebook.

The static analysis tools have identified several issues that should be addressed:

  1. Unused import executorlib.SlurmJobExecutor on line 2
  2. Module-level imports not at the top of file (lines 16, 65-66, 93-94)
  3. Use contextlib.suppress(OSError) instead of try-except-pass (lines 71-74)
  4. Use single with statement for multiple contexts (lines 96-97)

For the unused import, apply this diff:

-from executorlib import SlurmJobExecutor

For the try-except block, consider:

-    try:
-        shutil.rmtree(cache_dir)
-    except OSError:
-        pass
+    with contextlib.suppress(OSError):
+        shutil.rmtree(cache_dir)
executorlib/task_scheduler/interactive/slurmspawner.py (1)

97-97: LGTM! Consider using list append for consistency.

The PMI mode parameter addition and conditional logic are correct. For consistency with the rest of the function, consider using the list append pattern:

-    if pmi_mode is not None:
-        command_prepend_lst += ["--mpi=" + pmi_mode]
+    if pmi_mode is not None:
+        command_prepend_lst += ["--mpi", pmi_mode]

This approach is more consistent with other arguments in the function and avoids string concatenation.

Also applies to: 111-111, 119-120
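To make the flag handling concrete, a minimal sketch of the srun prefix construction; the name `generate_srun_prefix` and the reduced signature are illustrative — the real `generate_slurm_command` handles many more options (output paths, node counts, user arguments):

```python
from typing import Optional


def generate_srun_prefix(cores: int, pmi_mode: Optional[str] = None) -> list:
    # Build the srun launch prefix and append the PMI flag only when a
    # mode is requested, matching the conditional logic reviewed above.
    command = ["srun", "-n", str(cores)]
    if pmi_mode is not None:
        command.append("--mpi=" + pmi_mode)
    return command
```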

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5253722 and 3646610.

📒 Files selected for processing (20)
  • .github/workflows/pipeline.yml (2 hunks)
  • docs/installation.md (1 hunks)
  • executorlib/executor/flux.py (14 hunks)
  • executorlib/executor/single.py (1 hunks)
  • executorlib/executor/slurm.py (12 hunks)
  • executorlib/standalone/command.py (2 hunks)
  • executorlib/standalone/inputcheck.py (1 hunks)
  • executorlib/task_scheduler/file/shared.py (3 hunks)
  • executorlib/task_scheduler/file/task_scheduler.py (7 hunks)
  • executorlib/task_scheduler/interactive/fluxspawner.py (4 hunks)
  • executorlib/task_scheduler/interactive/slurmspawner.py (7 hunks)
  • notebooks/3-hpc-job.ipynb (1 hunks)
  • tests/test_fluxclusterexecutor.py (4 hunks)
  • tests/test_fluxjobexecutor.py (2 hunks)
  • tests/test_fluxpythonspawner.py (2 hunks)
  • tests/test_slurmclusterexecutor.py (5 hunks)
  • tests/test_slurmjobexecutor.py (1 hunks)
  • tests/test_standalone_command.py (1 hunks)
  • tests/test_standalone_inputcheck.py (2 hunks)
  • tests/test_standalone_interactive_backend.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/installation.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • executorlib/task_scheduler/file/shared.py
  • tests/test_standalone_command.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
executorlib/executor/flux.py (1)
executorlib/standalone/inputcheck.py (1)
  • check_pmi (104-116)
tests/test_standalone_inputcheck.py (1)
executorlib/standalone/inputcheck.py (1)
  • check_pmi_mode (149-153)
executorlib/task_scheduler/file/task_scheduler.py (1)
executorlib/standalone/inputcheck.py (1)
  • check_pmi_mode (149-153)
🪛 Ruff (0.12.2)
notebooks/3-hpc-job.ipynb

2-2: executorlib.SlurmJobExecutor imported but unused

Remove unused import: executorlib.SlurmJobExecutor

(F401)


16-16: Module level import not at top of file

(E402)


65-65: Module level import not at top of file

(E402)


66-66: Module level import not at top of file

(E402)


71-74: Use contextlib.suppress(OSError) instead of try-except-pass

Replace with contextlib.suppress(OSError)

(SIM105)


93-93: Module level import not at top of file

(E402)


94-94: Module level import not at top of file

(E402)


96-97: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_win
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: notebooks_integration
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: unittest_flux_mpich
🔇 Additional comments (34)
executorlib/executor/single.py (1)

332-332: Parameter rename aligns with codebase refactoring.

The change from flux_executor_pmi_mode to pmi_mode is consistent with the broader refactoring effort to standardize PMI interface parameter naming across the codebase.

tests/test_fluxjobexecutor.py (1)

93-93: Test parameter updates maintain coverage with new naming convention.

The parameter rename from flux_executor_pmi_mode to pmi_mode in both test methods correctly maintains test coverage for MPI functionality while adopting the standardized parameter naming.

Also applies to: 105-105

tests/test_fluxpythonspawner.py (1)

85-85: Dictionary key updates align with parameter standardization.

The parameter key changes from "flux_executor_pmi_mode" to "pmi_mode" in the executor_kwargs dictionaries correctly reflect the standardized parameter naming while maintaining test functionality.

Also applies to: 99-99

tests/test_slurmclusterexecutor.py (2)

29-29: SBATCH directive change improves MPI job specification.

The change from --cpus-per-task={{cores}} to --ntasks={{cores}} is appropriate for MPI jobs, as --ntasks correctly specifies the number of MPI processes rather than CPUs per task.
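For illustration, a minimal sbatch template along these lines — the directives besides `--ntasks={{cores}}` are assumptions, not the project's actual test template:

```bash
#!/bin/bash
#SBATCH --job-name={{job_name}}
#SBATCH --ntasks={{cores}}

{{command}}
```

With `--ntasks`, SLURM allocates one slot per MPI rank, which is what `srun -n {{cores}}` expects; `--cpus-per-task` instead sizes a single task for threading.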


53-53: PMI mode parameter addition enhances MPI compatibility.

The addition of pmi_mode="pmi2" parameter to all SlurmClusterExecutor test cases aligns with the codebase standardization effort and provides explicit PMI interface specification for better MPI compatibility.

Also applies to: 67-67, 81-81, 100-100

tests/test_fluxclusterexecutor.py (1)

44-44: PMI mode parameter addition standardizes test configuration.

The addition of pmi_mode=pmi parameter to all FluxClusterExecutor test cases properly standardizes PMI interface specification across the test suite, using the environment-based configuration for runtime flexibility.

Also applies to: 58-58, 86-86, 105-105

tests/test_standalone_interactive_backend.py (1)

87-87: LGTM! Test correctly updated for PMI mode support.

The test properly validates that the pmi_mode="pmi2" parameter results in the --mpi=pmi2 flag being included in the generated Slurm command. This aligns well with the broader PMI mode standardization across the codebase.

Also applies to: 105-105

executorlib/task_scheduler/interactive/fluxspawner.py (1)

38-38: LGTM! Clean parameter standardization.

The renaming from flux_executor_pmi_mode to pmi_mode properly standardizes the PMI mode parameter across the codebase. The implementation correctly updates the constructor parameter, internal attribute, docstring, and usage in the bootup method.

Also applies to: 54-54, 69-69, 112-113

executorlib/standalone/inputcheck.py (1)

149-153: LGTM! Function properly generalized for unified PMI mode support.

The rename from check_flux_executor_pmi_mode to check_pmi_mode and the updated error message correctly reflect the broader applicability to both SLURM and Flux backends. This aligns well with the PMI mode standardization effort.
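A minimal sketch of what the renamed validator can look like; the exact message wording in `executorlib/standalone/inputcheck.py` may differ:

```python
from typing import Optional


def check_pmi_mode(pmi_mode: Optional[str] = None) -> None:
    # Raise when a PMI mode is requested on a local workstation, where no
    # SLURM or Flux launcher exists to consume the flag.
    if pmi_mode is not None:
        raise ValueError(
            "The pmi_mode parameter is only supported for the SLURM and "
            "Flux backends, not on a local workstation."
        )
```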

tests/test_standalone_inputcheck.py (1)

16-16: LGTM! Test properly updated to match function renaming.

The test correctly reflects the renaming from check_flux_executor_pmi_mode to check_pmi_mode and uses the new pmi_mode parameter. The test logic remains intact and continues to validate that specifying PMI mode raises a ValueError on local workstations.

Also applies to: 80-82

tests/test_slurmjobexecutor.py (3)

17-23: LGTM! Well-structured MPI test function.

The mpi_funct function correctly imports mpi4py locally and returns the expected tuple format for testing parallel execution.


30-30: Good addition of PMI mode configuration.

Adding --mpi=pmi2 to the slurm_cmd_args properly tests the new PMI mode functionality for serial execution.


38-46: Comprehensive parallel execution test.

The test correctly configures the executor for parallel execution with 2 cores and validates that the MPI function returns results from both ranks. The assertion properly checks for the expected list of tuples format.

notebooks/3-hpc-job.ipynb (1)

122-122: Parameter renaming is consistent and correct.

The change from flux_executor_pmi_mode to pmi_mode aligns with the broader standardization effort across the codebase.

executorlib/executor/slurm.py (6)

47-47: Good documentation of PMI mode parameter.

The docstring clearly explains the purpose of pmi_mode and mentions the OpenMPI v5 requirement for pmix.


95-95: Consistent parameter addition in constructor.

The pmi_mode parameter is properly added to the SlurmClusterExecutor constructor signature.


179-179: Parameter propagation is correct.

The pmi_mode parameter is properly passed through to the create_file_executor call.


285-285: Consistent implementation across executor classes.

The SlurmJobExecutor follows the same pattern as SlurmClusterExecutor for the pmi_mode parameter addition.


400-400: Factory function updated consistently.

The create_slurm_executor function signature properly includes the pmi_mode parameter.


454-454: Parameter correctly added to resource dictionary.

The pmi_mode is properly stored in the resource dictionary for downstream use.

executorlib/executor/flux.py (5)

46-46: Parameter renamed for consistency.

The standardization from flux_executor_pmi_mode to pmi_mode improves consistency across the codebase.


96-96: Constructor signature updated correctly.

The FluxJobExecutor constructor properly uses the new pmi_mode parameter name.


178-178: Parameter propagation maintained.

The pmi_mode parameter is correctly passed through to the create_flux_executor call.


473-473: check_pmi call updated correctly.

The function call now uses the standardized pmi_mode parameter name instead of the old flux_executor_pmi_mode.


482-482: Resource dictionary assignment updated.

The resource dictionary now correctly uses the pmi_mode key for consistency.

executorlib/standalone/command.py (3)

20-25: Good function signature extension.

The addition of backend and pmi_mode parameters with proper optional typing extends functionality while maintaining backward compatibility.


32-33: Clear documentation of new parameters.

The docstring properly documents the new parameters, including the backend options and PMI mode usage.


40-66: Well-structured backend handling logic.

The implementation correctly handles different backends:

  • Default behavior preserved for backend=None
  • Proper command construction for slurm with --mpi=<pmi_mode>
  • Correct flux command with -o pmi=<pmi_mode>
  • Appropriate error handling for invalid backends

The logic is clean and maintains the existing command structure while adding new functionality.

executorlib/task_scheduler/interactive/slurmspawner.py (3)

9-20: LGTM! Good defensive programming practice.

The changes safely check for the presence of SLURM environment variables before accessing them, preventing potential KeyError exceptions when these variables are not set.
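An illustrative sketch of the defensive pattern — the function name and the derived value are hypothetical, and the module's actual logic differs, but the `dict.get` idiom is the point:

```python
import os


def read_slurm_allocation(environ=None):
    # Read SLURM allocation variables without raising KeyError when they
    # are absent (e.g. outside a SLURM job).
    env = os.environ if environ is None else environ
    ntasks = env.get("SLURM_NTASKS")
    cpus_per_task = env.get("SLURM_CPUS_PER_TASK")
    if ntasks is not None and cpus_per_task is not None:
        return int(ntasks) * int(cpus_per_task)
    return None
```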


34-34: LGTM! Proper parameter addition for PMI mode support.

The new pmi_mode parameter is correctly added with appropriate type annotation, documentation, and instance variable storage.

Also applies to: 48-48, 60-60


81-81: LGTM! Proper parameter propagation.

The pmi_mode parameter is correctly passed to the generate_slurm_command function.

executorlib/task_scheduler/file/task_scheduler.py (3)

10-10: LGTM! Import updated for unified PMI mode support.

The import change from check_flux_executor_pmi_mode to check_pmi_mode aligns with the broader refactoring to unify PMI mode handling across different backends.


37-37: LGTM! Proper PMI mode parameter integration.

The pmi_mode parameter is correctly added to the constructor with appropriate type annotation, documentation, and propagation to the process kwargs for task execution.

Also applies to: 52-52, 73-73


89-89: Approved: PMI mode conditional logic verified

The addition and propagation of the pmi_mode parameter are correct. The check_pmi_mode function (which raises ValueError when pmi_mode is non-null) is only invoked under backend is None, enforcing the “no PMI on a local workstation” rule as intended. No further changes required.

cache_directory=cache_directory,
max_cores=max_cores,
resource_dict=resource_dict,
pmi_mode=None,

⚠️ Potential issue

Hardcoded None value in plot dependency graph path.

In the FluxClusterExecutor, when plot_dependency_graph is True, pmi_mode is hardcoded to None instead of using the constructor parameter. This may not be the intended behavior.

Apply this diff to fix the issue:

-                        pmi_mode=None,
+                        pmi_mode=pmi_mode,

@jan-janssen jan-janssen merged commit 1c5b190 into main Jul 27, 2025
52 of 55 checks passed
@jan-janssen jan-janssen deleted the parallel_file_executor branch July 27, 2025 19:23